On Optimal Generalizability in Parametric Learning

Beirami, Ahmad, Razaviyayn, Meisam, Shahrampour, Shahin, Tarokh, Vahid

Neural Information Processing Systems

We consider the parametric learning problem, where the objective of the learner is determined by a parametric loss function. Under empirical risk minimization, possibly with regularization, the inferred parameter vector is biased toward the training samples. In practice, this bias is measured by cross validation: the data set is partitioned into a training set, used for training, and a validation set, which is held out of training and used to measure out-of-sample performance. A classical strategy is leave-one-out cross validation (LOOCV), in which a single sample is held out for validation, the model is trained on the remaining samples, and the process is repeated over all samples. LOOCV is rarely used in practice due to its high computational cost. In this paper, we first develop a computationally efficient approximate LOOCV (ALOOCV) and provide theoretical guarantees for its performance. We then use ALOOCV to derive an optimization algorithm for choosing the regularizer in the empirical risk minimization framework. Our numerical experiments illustrate the accuracy and efficiency of ALOOCV, as well as the effectiveness of the proposed framework for optimizing the regularizer.
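To make the LOOCV-versus-ALOOCV idea concrete, below is a minimal sketch for L2-regularized logistic regression: exact LOOCV refits the model n times, while the approximation performs one fit and then applies a single Newton-style correction per held-out sample. This is our own illustration of the general idea, not the authors' ALOOCV formula; the correction step and the function names (fit, loocv, aloocv) are assumptions made for this sketch.

import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for a quick sketch.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit(X, y, lam, iters=25):
    # Newton's method on the regularized empirical risk
    #   R(theta) = sum_i logloss_i(theta) + (lam / 2) * ||theta||^2
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(iters):
        mu = sigmoid(X @ theta)
        grad = X.T @ (mu - y) + lam * theta
        W = mu * (1.0 - mu)
        H = (X * W[:, None]).T @ X + lam * np.eye(p)
        theta = theta - np.linalg.solve(H, grad)
    return theta

def sample_loss(x, yi, theta):
    # Log loss of a single held-out sample.
    pi = sigmoid(x @ theta)
    return -(yi * np.log(pi) + (1.0 - yi) * np.log(1.0 - pi))

def loocv(X, y, lam):
    # Exact LOOCV: n full refits -- the cost the paper seeks to avoid.
    n = len(y)
    idx = np.arange(n)
    return np.mean([sample_loss(X[i], y[i], fit(X[idx != i], y[idx != i], lam))
                    for i in range(n)])

def aloocv(X, y, lam):
    # One full fit, then a single Newton-style correction per sample:
    # the full-data gradient vanishes at theta_hat, so dropping sample i
    # leaves gradient -grad_i, and one Newton step gives
    #   theta_hat_i ~= theta_hat + H^{-1} grad_i.
    # (Infinitesimal-jackknife flavor; a stand-in for the paper's formula.)
    n, p = X.shape
    theta = fit(X, y, lam)
    mu = sigmoid(X @ theta)
    W = mu * (1.0 - mu)
    H = (X * W[:, None]).T @ X + lam * np.eye(p)
    Hinv = np.linalg.inv(H)
    losses = []
    for i in range(n):
        grad_i = (mu[i] - y[i]) * X[i]
        theta_i = theta + Hinv @ grad_i
        losses.append(sample_loss(X[i], y[i], theta_i))
    return np.mean(losses)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (rng.random(200) < sigmoid(X @ rng.normal(size=5))).astype(float)
    for lam in (0.1, 1.0):
        print(f"lam={lam}: LOOCV={loocv(X, y, lam):.4f}  ALOOCV={aloocv(X, y, lam):.4f}")

Since aloocv yields a cheap surrogate for out-of-sample loss as a function of lam, a simple use is to sweep (or gradient-step) lam and pick the minimizer, which is the spirit of the regularizer-tuning framework the abstract describes.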


Reviews: On Optimal Generalizability in Parametric Learning

Neural Information Processing Systems

This paper proposes an efficiently computable approximation of leave-one-out cross validation for parametric learning problems, as well as an algorithm for jointly learning the regularization parameters and model parameters. These techniques seem novel and widely applicable. The paper starts out clearly written, though perhaps less space could have been spent laying the groundwork, leaving more room for the later sections, where the notation is quite dense. Can you say anything about the comparison between ALOOCV and LOOCV evaluated on only a subset of the data points (as you mention in l137-140), both in terms of computational cost and approximation accuracy? Other comments: l75: are you referring to PRESS?
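As a rough framing for this question (our own back-of-envelope accounting, not taken from the paper): exact LOOCV on a uniformly drawn subset of m points trades computation for Monte Carlo error, whereas an ALOOCV-style scheme pays one training run plus a per-sample correction. For a p-dimensional model with one Hessian factorization, the costs compare as

\[
\text{subset LOOCV: } m \cdot T_{\mathrm{fit}}
\qquad\text{vs.}\qquad
\text{ALOOCV: } T_{\mathrm{fit}} + O(p^3) + n \cdot O(p^2),
\]

with the subset estimate additionally carrying sampling error of order \(1/\sqrt{m}\).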

